A hybrid approach to compounds in LVCSR

نویسندگان

  • Tom Laureys
  • Vincent Vandeghinste
  • Jacques Duchateau
چکیده

In several languages compound words form orthographic units, which complicates the task of ensuring good lexical coverage for large vocabulary continuous speech recognition (LVCSR). A common approach to the problem consists of first recognizing the compound constituents, followed by an automatic recompounding process. We describe an accurate compound module, which combines a rule-based approach with statistical pruning. The module is incorporated in a broadcast news recognition task for Dutch and yields an 11% relative decrease in word error rate (WER).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hybrid language model for open-vocabulary Thai LVCSR

This paper investigates the use of a hybrid language model for open-vocabulary Thai LVCSR. Thai text is written without word boundary markers and the definition of word unit is often ambiguous due to the presence of compound words. Hence, to build open-vocabulary LVCSR, a very large lexicon is required to also handle word unit ambiguity. Pseudomorpheme (PM), a syllable-like sub-word unit specif...

متن کامل

Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR

For languages like German and Polish, higher numbers of word inflections lead to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Thus, one of the main challenges in large vocabulary continuous speech recognition (LVCSR) is recognizing an open vocabulary. In this paper, we investigate the use of mixed type of sub-word units in the same recognition lexicon. Namely, m...

متن کامل

Efficient search using posterior phone probability estimates

In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning where phones with a small l...

متن کامل

Tied posteriors: an approach for effective introduction of context dependency in hybrid NN/HMM LVCSR

This papers presents a method to improve the recognition rate of hybrid connectionist/HMM speech recognition systems. At the same time this approach allows the easy introduction of context dependent models in the hybrid framework. The approach is based on a standard hybrid connectionist/HMM recognizer, in which the neural nets are trained to estimate the a posteriori probabilities for all phone...

متن کامل

Combining multiple-type input units using recurrent neural network for LVCSR language modeling

In this paper, we investigate the use of a Recurrent Neural Network (RNN) in combining hybrid input types, namely word and pseudo-morpheme (PM) for Thai LVCSR language modeling. Similar to other neural network frameworks, there is no restriction on RNN input types. To exploit this advantage, the input vector of a proposed hybrid RNN language model (RNNLM) is a concatenated vector of word and PM...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002